AITopics | adaptive gradient

Collaborating Authors

adaptive gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Neural Information Processing SystemsDec-24-2025, 02:21:39 GMT

Adaptive gradient methods have shown excellent performances for solving many machine learning problems. Although multiple adaptive gradient methods were recently studied, they mainly focus on either empirical or theoretical aspects and also only work for specific problems by using some specific adaptive learning rates. Thus, it is desired to design a universal framework for practical algorithms of adaptive gradients with theoretical guarantee to solve general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate the momentum and variance reduced techniques. In particular, our novel framework provides the convergence analysis support for adaptive gradient methods under the nonconvex setting. In theoretical analysis, we prove that our SUPER-ADAM algorithm can achieve the best known gradient (i.e., stochastic first-order oracle (SFO)) complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of nonconvex optimization, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we employ various deep learning tasks to validate that our algorithm consistently outperforms the existing adaptive algorithms.

faster and universal framework, name change, super-adam, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Adaptive Active Hypothesis Testing under Limited Information

Fabio Cecchi, Nidhi Hegde

Neural Information Processing SystemsNov-21-2025, 12:03:05 GMT

Our objective is to infer the true hypothesis with low error, while minimizing the number of action sampled.

artificial intelligence, hypothesis, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.42)

Add feedback

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing SystemsOct-2-2025, 17:51:25 GMT

"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","1527" "Title:","Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper considers asynchronous parallel updates in stochastic gradient descent with delays. This is a very important problem in large-scale distributed data processing. The objective of the problem studied in this paper is to achieve regret bounds similar to the ones obtained by adaptive gradient (i.e. This boils down to keeping track of updates to gradient coordinates.

algorithm, author feedback and meta-review, cc paperinformation reviewerinstruction, (9 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.25)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

Add feedback

Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization

Coppola, Corrado, Papa, Lorenzo, Amerini, Irene, Palagi, Laura

arXiv.org Artificial IntelligenceDec-16-2024

Adaptive gradient methods have been increasingly adopted by deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate the F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in the overall training time by up to 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.

artificial intelligence, fast-controlled minibatch algorithm, machine learning, (3 more...)

arXiv.org Artificial Intelligence

2411.15795

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

Neural Information Processing SystemsOct-10-2024, 06:45:42 GMT

adaptive gradient, faster and universal framework, super-adam, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Adaptive Active Hypothesis Testing under Limited Information

Fabio Cecchi, Nidhi Hegde

Neural Information Processing SystemsOct-4-2024, 00:30:42 GMT

We consider the problem of active sequential hypothesis testing where a Bayesian decision maker must infer the true hypothesis from a set of hypotheses. The decision maker may choose for a set of actions, where the outcome of an action is corrupted by independent noise. In this paper we consider a special case where the decision maker has limited knowledge about the distribution of observations for each action, in that only a binary value is observed. Our objective is to infer the true hypothesis with low error, while minimizing the number of action sampled. Our main results include the derivation of a lower bound on sample size for our system under limited knowledge and the design of an active learning policy that matches this lower bound and outperforms similar known algorithms.

algorithm, hypothesis, true hypothesis, (16 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.63)

Add feedback

Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods

O'Leary-Roseberry, Thomas, Bollapragada, Raghu

arXiv.org Machine LearningAug-13-2024

We consider minimizing finite-sum and expectation objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. The recent work (Na et al. 2023) demonstrated that Hessian averaging can be utilized to achieve fast $\mathcal{O}\left(\sqrt{\tfrac{\log k}{k}}\right)$ local superlinear convergence for strongly convex functions in high probability, while maintaining fixed per-iteration Hessian costs. These methods, however, require gradient exactness and strong convexity, which poses challenges for their practical implementation. To address this concern we consider Hessian-averaged methods that allow gradient inexactness via norm condition based adaptive-sampling strategies. For the finite-sum problem we utilize deterministic sampling techniques which lead to global linear and sublinear convergence rates for strongly convex and nonconvex functions respectively. In this setting we are able to derive an improved deterministic local superlinear convergence rate of $\mathcal{O}\left(\tfrac{1}{k}\right)$. For the %expected risk expectation problem we utilize stochastic sampling techniques, and derive global linear and sublinear rates for strongly convex and nonconvex functions, as well as a $\mathcal{O}\left(\tfrac{1}{\sqrt{k}}\right)$ local superlinear convergence rate, all in expectation. We present novel analysis techniques that differ from the previous probabilistic results. Additionally, we propose scalable and efficient variations of these methods via diagonal approximations and derive the novel diagonally-averaged Newton (Dan) method for large-scale problems. Our numerical results demonstrate that the Hessian averaging not only helps with convergence, but can lead to state-of-the-art performance on difficult problems such as CIFAR100 classification with ResNets.

approximation, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2408.07268

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adap DP-FL: Differentially Private Federated Learning with Adaptive Noise

Fu, Jie, Chen, Zhili, Han, Xiao

arXiv.org Artificial IntelligenceNov-28-2022

Federated learning seeks to address the issue of isolated data islands by making clients disclose only their local training models. However, it was demonstrated that private information could still be inferred by analyzing local model parameters, such as deep neural network model weights. Recently, differential privacy has been applied to federated learning to protect data privacy, but the noise added may degrade the learning performance much. Typically, in previous work, training parameters were clipped equally and noises were added uniformly. The heterogeneity and convergence of training parameters were simply not considered. In this paper, we propose a differentially private scheme for federated learning with adaptive noise (Adap DP-FL). Specifically, due to the gradient heterogeneity, we conduct adaptive gradient clipping for different clients and different rounds; due to the gradient convergence, we add decreasing noises accordingly. Extensive experiments on real-world datasets demonstrate that our Adap DP-FL outperforms previous methods significantly.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.15893

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

AEGD: Adaptive Gradient Decent with Energy

Liu, Hailiang, Tian, Xuping

arXiv.org Machine LearningOct-10-2020

In this paper, we propose AEGD, a new algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive updates of quadratic energy. As long as an objective function is bounded from below, AEGD can be applied, and it is shown to be unconditionally energy stable, irrespective of the step size. In addition, AEGD enjoys tight convergence rates, yet allows a large step size. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for various optimization problems: it is robust with respect to initial data, capable of making rapid initial progress, shows comparable and most times better generalization performance than SGD with momentum for deep neural networks. The implementation of the algorithm can be found at https://github.com/txping/AEGD.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2010.05109

Country:

North America > United States > Iowa > Story County > Ames (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Training Deep Neural Networks Without Batch Normalization

Gaur, Divya, Folz, Joachim, Dengel, Andreas

arXiv.org Machine LearningAug-18-2020

Training neural networks is an optimization problem, and finding a decent set of parameters through gradient descent can be a difficult task. A host of techniques has been developed to aid this process before and during the training phase. One of the most important and widely used class of method is normalization. It is generally favorable for neurons to receive inputs that are distributed with zero mean and unit variance, so we use statistics about dataset to normalize them before the first layer. However, this property cannot be guaranteed for the intermediate activations inside the network. A widely used method to enforce this property inside the network is batch normalization. It was developed to combat covariate shift inside networks. Empirically it is known to work, but there is a lack of theoretical understanding about its effectiveness and potential drawbacks it might have when used in practice. This work studies batch normalization in detail, while comparing it with other methods such as weight normalization, gradient clipping and dropout. The main purpose of this work is to determine if it is possible to train networks effectively when batch normalization is removed through adaption of the training process.

artificial intelligence, machine learning, non-bn network, (15 more...)

arXiv.org Machine Learning

2008.0797

Country: Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback